Search for: All records

Creators/Authors contains: "Stephen, H."

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

  1. Free, publicly-accessible full text available June 24, 2026
  2. Biomedical knowledge graphs (KGs) encode rich, structured information critical for drug discovery tasks, but extracting meaningful insights from large-scale KGs remains challenging due to their complex structure. Existing biomedical subgraph retrieval methods are tailored for graph neural networks (GNNs), limiting compatibility with other paradigms, including large language models (LLMs). We introduce K-Paths, a model-agnostic retrieval framework that extracts structured, diverse, and biologically meaningful multi-hop paths from dense biomedical KGs. These paths enable prediction of unobserved drug-drug and drug-disease interactions, including those involving entities not seen during training, thus supporting inductive reasoning. K-Paths is training-free and employs a diversity-aware adaptation of Yen's algorithm to extract the K shortest loopless paths between entities in a query, prioritizing biologically relevant and relationally diverse connections. These paths serve as concise, interpretable reasoning chains that can be directly integrated with LLMs or GNNs to improve generalization and accuracy and to enable explainable inference. Experiments on benchmark datasets show that K-Paths improves zero-shot reasoning across state-of-the-art LLMs. For instance, Tx-Gemma 27B improves by 19.8 and 4.0 F1 points on interaction severity prediction and drug repurposing tasks, respectively. Llama 70B achieves gains of 8.5 and 6.2 points on the same tasks. K-Paths also boosts the training efficiency of EmerGNN, a state-of-the-art GNN, by reducing the KG size by 90% while maintaining predictive performance. Beyond efficiency, K-Paths bridges the gap between KGs and LLMs, enabling scalable and explainable LLM-augmented scientific discovery. We release our code and the retrieved paths as a benchmark for inductive reasoning.
    Free, publicly-accessible full text available August 3, 2026
  3. We present an implementation of the relativistic ionization-potential (IP) equation-of-motion coupled-cluster (EOMCC) method with up to 3-hole–2-particle (3h2p) excitations that makes use of the molecular mean-field exact two-component (X2C) framework and the full Dirac–Coulomb–Breit Hamiltonian. The closed-shell nature of the reference state in an X2C-IP-EOMCC calculation allows for accurate predictions of spin–orbit splittings in open-shell molecules without breaking degeneracies, as would occur in an excitation-energy EOMCC calculation carried out directly on an unrestricted open-shell reference. We apply X2C-IP-EOMCC to the ground and first excited states of the HCCX⁺ (X = Cl, Br, I) cations and demonstrate that a large basis set (i.e., quadruple-zeta quality) and 3h2p correlation effects are necessary for accurate absolute energetics. The maximum error in calculated adiabatic IPs is on the order of 0.1 eV, whereas the spin–orbit splittings themselves are accurate to ≈0.01 eV, relative to experimentally obtained values.
    Free, publicly-accessible full text available February 28, 2026
  4. Free, publicly-accessible full text available December 5, 2025
  5. Large-scale neural network models combining text and images have made incredible progress in recent years. However, it remains an open question to what extent such models encode compositional representations of the concepts over which they operate, such as correctly identifying "red cube" by reasoning over the constituents "red" and "cube". In this work, we focus on the ability of a large pretrained vision and language model (CLIP) to encode compositional concepts and to bind variables in a structure-sensitive way (e.g., differentiating "cube behind sphere" from "sphere behind cube"). To inspect the performance of CLIP, we compare several architectures from research on compositional distributional semantics models (CDSMs), a line of research that attempts to implement traditional compositional linguistic structures within embedding spaces. We benchmark them on three synthetic datasets (single-object, two-object, and relational) designed to test concept binding. We find that CLIP can compose concepts in a single-object setting, but in situations where concept binding is needed, performance drops dramatically. At the same time, CDSMs also perform poorly, with best performance at chance level.
  6. The field of computer science has a problem of representation: many groups are not represented in our classrooms at levels approaching their composition in society. Unfortunately, the representation issue is a larger societal issue and begins well before students enter our institutions. Though we acknowledge that building inclusive and equitable classroom environments cannot increase representation by itself, it can have an impact on retention and inclusion for members of marginalized communities. Current grading policies overemphasize the gaming aspect of points (e.g., the goal is to maximize points) in ways that distract students from paying attention to learning. Alternatives to traditional grading, such as standards- or competency-based grading, specifications-based grading, and ungrading, allow instructors to change the conversation and redirect the focus to learning. The goal of this Birds of a Feather is to foster the creation of a community of like-minded educators interested in exploring alternative grading methodologies in computer science, with the broader aim of making computing classrooms more accessible and equitable for all students.